PISA 2012 Project - COMMUNICATE DATA FINDINGS

PROJECT 5

MAY 16-2020

NIMMY GEORGE

Data Analyst Nanodegree

Introduction

PISA may be a survey of students' skills and knowledge as they approach the top of compulsory education. it's not a conventional school test. instead of examining how well students have learned the varsity curriculum, it's at how well prepared they're for all times beyond school. Around 510,000 students in 65 economies took part within the PISA 2012 assessment of reading, mathematics and science representing about 28 million 15-year-olds globally. Of those economies, 44 took part in an assessment of creative problem solving and 18 in an assessment of monetary literacy.

FEATURES USED FOR THIS PROJECT

1.Country

2.Gender

3.Overall_score: student's overall score of maths

4.Motivation: Giving motivation for studying and to get a future job

5.Anxiety: Helping to control their anxiety

6.Interest:To know their interest.

7.Work_ethic: Students work which include homeworks are viewed to know whether they complete it on time or done with or without ethics.

8.Parents: student's perceived view of parent's attitude towards mathematics is vital.

In [1]:
import numpy
import pandas 
import matplotlib.pyplot as plt
import seaborn 

%matplotlib inline
plt.style.use('fivethirtyeight')
In [2]:
data = pandas.read_csv('Desktop/pisa2012_clean.csv')
data.head()
Out[2]:
country gender score_overall motivation anxiety interest work_ethic parents behavior self
0 Albania Female 406.8469 3.50 2.6 3.00 3.000000 3.000000 1.75 2.8
1 Albania Female 486.1427 2.75 NaN 2.25 3.222222 2.666667 2.50 NaN
2 Albania Female 533.2684 4.00 NaN 3.25 3.888889 3.666667 2.00 NaN
3 Albania Female 412.2215 NaN NaN NaN NaN NaN NaN 2.6
4 Albania Female 381.9209 4.00 3.2 2.50 3.777778 3.666667 NaN 2.4
In [3]:
print(data.shape,'\n')
print(data.info())
(485490, 10) 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 485490 entries, 0 to 485489
Data columns (total 10 columns):
country          485490 non-null object
gender           485490 non-null object
score_overall    485490 non-null float64
motivation       312694 non-null float64
anxiety          307425 non-null float64
interest         311611 non-null float64
work_ethic       306731 non-null float64
parents          311682 non-null float64
behavior         307112 non-null float64
self             306931 non-null float64
dtypes: float64(8), object(2)
memory usage: 40.7+ MB
None
In [4]:
data[['country','gender']]=data[['country','gender']].astype('category')
In [5]:
print(data.describe())
       score_overall     motivation        anxiety       interest  \
count  485490.000000  312694.000000  307425.000000  311611.000000   
mean      469.621653       2.988886       2.495314       2.445061   
std       103.265391       0.740661       0.683011       0.779287   
min        19.792800       1.000000       1.000000       1.000000   
25%       395.318600       2.500000       2.000000       2.000000   
50%       466.201900       3.000000       2.400000       2.500000   
75%       541.057800       3.500000       3.000000       3.000000   
max       962.229300       4.000000       4.000000       4.000000   

          work_ethic        parents       behavior           self  
count  306731.000000  311682.000000  307112.000000  306931.000000  
mean        2.866265       3.066485       1.677428       2.482890  
std         0.577228       0.610225       0.557671       0.742968  
min         1.000000       1.000000       1.000000       1.000000  
25%         2.555556       2.666667       1.250000       2.000000  
50%         2.888889       3.000000       1.500000       2.400000  
75%         3.222222       3.666667       1.875000       3.000000  
max         4.000000       4.000000       4.000000       4.000000  
What is the structure of the given dataset?

We can see that there are 485,490 students within the dataset with 17 features (country, gender, overall math score, motivation, anxiety, interest, work ethic, behavior, self, and parents). Country and Gender are of category type and therefore the remainder of the features are numeric.So we can say that these are described in the given dataset of csv file.

What are the most feature(s) of interest in your dataset?

From my analysis, I'm curious about seeing the consequences of gender has on students' mathematical skills. I wanted to require under consideration the students' math scores, but also the scholars and parent's feelings towards Mathematics.

I might also wish to know if the Mathematic Skills and Attitudes between Gender across different countries also if a student's perceived attitude of their parents differ between gender.

What features within the dataset does one think will help support your investigation into your features of interest?

I expect that there'll be a better overall test and subsection score among males compared thereto of females. For test scores, we'll be watching the score_overall. I also expect there to be a better attitude towards mathematics among males compared thereto of the females. For attitudes, we'll be watching the subsequent features: motivation, anxiety, interest, work_ethic, behavior, and self. All attitude features, with the exception of 'anxiety', show that the upper the score is, the higher the student's attitude is towards Mathematics.

EXPLORATION

SCORES

In [6]:
#Overall Score plotting
binsize=50
bins=numpy.arange(0,data['score_overall'].max()+binsize,binsize)

plt.figure(figsize=[12,6])
plt.hist(data= data,x='score_overall',bins=bins)
plt.title('OVERALL SCORES OF MATH', size=20)##title
plt.xlabel('Score of Math')## xaxis
plt.ylabel('No of Students');## yaxis

plt.savefig('hist_score_overall.png')
In [7]:
##Plotting pie graph
plt.pie(data['gender'].value_counts(), labels=data['gender'].value_counts().index,startangle=90,counterclock=False)
plt.axis('square')
plt.title('GENDER of STUDENTS');
In [8]:
print('Male Students: {:.4f}%'.format(data['gender'].value_counts()['Male']/data.shape[0]))##Getting the percentage of male students
print('Female Students: {:.4f}%'.format(data['gender'].value_counts()['Female']/data.shape[0]))##Getting the percentage of female students
Male Students: 0.4952%
Female Students: 0.5048%

COUNTRY

In [9]:
order= data['country'].value_counts().index
##plotting the graph to get no of students on the basis of country they are from
plt.figure(figsize=[20,20])
seaborn.countplot(data=data,y='country',order=order)
plt.title(' STUDENTS by COUNTRY',size=20);
In [10]:
##plotting the box graph
plt.figure(figsize=(20,8))
seaborn.boxplot(data['country'].value_counts(), color='red')
seaborn.swarmplot(data['country'].value_counts(), color='green')
plt.title('STUDENTS by COUNTRY', size=20);

Most countries have between 5000ish to 7000ish students taking the survey. The boxplot also shows that there are a couple of outliers. for instance , Italy and Mexico have over 30,000 students and Liechtenstein has well below 1000 students.have over 30,000 students and Liechtenstein has well below 1000 students.

In [11]:
##For describtion
data['country'].value_counts().describe()
Out[11]:
count       65.000000
mean      7469.076923
std       6125.192028
min        293.000000
25%       4743.000000
50%       5231.000000
75%       6856.000000
max      33806.000000
Name: country, dtype: float64

ATTITUDE

In [12]:
#Getting the attitude section
attitude=['motivation',  'work_ethic','interest', 'behavior','anxiety', 'self']

fig, ax= plt.subplots(nrows=2,ncols=3,figsize=[20,12])

##binsize can be found by 1/no of sections
binsizes=[1/4,1/5,1/4,1/9,1/8,1/5]

ax=ax.flatten()
i=0

for feature in attitude:
    bins=numpy.arange(min(data[feature]), max(data[feature]) +binsizes[i], binsizes[i])
    ax[i].hist(data=data, x=feature, bins=bins)
    ax[i].set_xlabel('Scores')##xlabel
    ax[i].set_ylabel('No of Students')##ylabel
    ax[i].set_title(feature)##title
    i+=1
  • under the motivation feature, it's just like the graph is slightly skewed to the left with spikes at points 3-3.25 and 3.75-4.

  • anxiety is nearly normally distributed

  • work_ethic is slightly skewed to the left with a spike at around 3 points.

  • behaviour is skewed to the proper showing that the majority students don't take extra actions towards Mathematics like taking about math with friends, play chess, or computer programming

PARENTS

In [13]:
##To get the parental attitudes
binsize=1/3
bins=numpy.arange(data['parents'].min(),data['parents'].max() +binsize, binsize)

plt.hist(data=data, x='parents', bins=bins)
plt.xlabel('score')##xlabel
plt.ylabel('number of students')##ylabel
plt.title('STUDENT VIEW FOR PARENTAL ATTITUDE TOWARDS MATH');##TITLE

What if we now just take a glance at just the highest percentile of scholars and see how gender is distributed?

In [14]:
##getting the percent of datas of scores with 90,95 and 99
data75=data.query('score_overall>=@data.score_overall.quantile(.75)')
data90=data.query('score_overall>=@data.score_overall.quantile(.90)')
data95=data.query('score_overall>=@data.score_overall.quantile(.95)')
data99=data.query('score_overall>=@data.score_overall.quantile(.99)')
In [15]:
##to draw piegraph based on percentages of 75,90,95 and 99
percent=[75,90,95,99]
fig, ax=plt.subplots(nrows=2, ncols=2, figsize=(10,10))

ax=ax.flatten()

for i in range(4):
    plt.sca(ax[i])
    datae=eval('data'+str(percent[i]))
    plt.pie(datae['gender'].value_counts(), labels=datae['gender'].value_counts().index,startangle=90,counterclock=False,autopct='%.3f')
    plt.axis('square')
    plt.title('Top '+ str(percent[i])+'th percentage by gender')

GENDER EACH BY COUNTRY

In [16]:
order= data['country'].value_counts().index
##getting the plot on no of students on the basis of their gender and country from where they are.
plt.figure(figsize=[20,40])
seaborn.countplot(data=data,y='country',order=order,hue='gender')
plt.title('Number of Students by Country with Gender',size=35);

While most countries show an almost equal split between gender, there are some countries like Mexico, Thailand and Israel that show an very uneven split. Let's take a better check out this

In [17]:
##sorting on the basis of country
country=data['country'].unique().tolist()
country.sort()
In [18]:
##grouping female and male students on the basis of their country
female=data.query('gender=="Female"').groupby('country').size()
male=data.query('gender=="Male"').groupby('country').size()
In [19]:
country==female.index.tolist()==male.index.tolist()
Out[19]:
True
In [20]:
datagender=pandas.DataFrame({'country': country, 'female': female.values,'male':male.values})
In [21]:
##To get the female and male percent based on their cointries
datagender['total_pop']=datagender['female']+datagender['male']
datagender['female_percent']=100*(datagender['female']/datagender['total_pop'])
datagender['male_percent']=100*(datagender['male']/datagender['total_pop'])
datagender['diff_percent']=(numpy.absolute((datagender['female_percent']-datagender['male_percent'])))
In [22]:
datagender.head(10)##displayinh 1st 10 values
Out[22]:
country female male total_pop female_percent male_percent diff_percent
0 Albania 2416 2327 4743 50.938225 49.061775 1.876450
1 Argentina 3113 2795 5908 52.691266 47.308734 5.382532
2 Australia 7075 7406 14481 48.857123 51.142877 2.285754
3 Austria 2357 2398 4755 49.568875 50.431125 0.862250
4 Belgium 4287 4310 8597 49.866232 50.133768 0.267535
5 Brazil 10175 9029 19204 52.983753 47.016247 5.967507
6 Bulgaria 2578 2704 5282 48.807270 51.192730 2.385460
7 Canada 10943 10601 21544 50.793724 49.206276 1.587449
8 Chile 3512 3344 6856 51.225204 48.774796 2.450408
9 China-Shanghai 2637 2540 5177 50.936836 49.063164 1.873672
In [23]:
datagender['diff_percent'].describe()##to getthe diff_percent by gender wise
Out[23]:
count    65.000000
mean      2.978123
std       2.855713
min       0.042159
25%       0.862250
50%       1.889100
75%       4.617358
max      13.109295
Name: diff_percent, dtype: float64
In [24]:
##plotting swarmplot
seaborn.swarmplot(data=datagender,x='diff_percent',color='red')
seaborn.boxplot(data=datagender, x='diff_percent',color='blue')
plt.title('PERCENT DIFFERENCE of COUNTRY by GENDER')##giving the title
plt.xlabel('percent');##xlabel

Almost half the countries have a difference in gender of about 2% and lower for college students . It even seems like two countries have a difference of quite 10%. Why is there such an outsized gap in gender for college students taking these exams for these countries?

In [25]:
datagender.sort_values('diff_percent',ascending=False).head(10)##sorting based on the diff_percent
Out[25]:
country female male total_pop female_percent male_percent diff_percent
57 Thailand 3736 2870 6606 56.554647 43.445353 13.109295
26 Israel 2825 2230 5055 55.885262 44.114738 11.770524
53 Slovenia 2699 3212 5911 45.660633 54.339367 8.678735
58 Tunisia 2390 2017 4407 54.231904 45.768096 8.463808
21 Hong Kong-China 2161 2509 4670 46.274090 53.725910 7.451820
31 Korea 2342 2691 5033 46.532883 53.467117 6.934234
12 Costa Rica 2460 2142 4602 53.455020 46.544980 6.910039
64 Vietnam 2648 2311 4959 53.397862 46.602138 6.795725
63 Uruguay 2826 2489 5315 53.170273 46.829727 6.340546
5 Brazil 10175 9029 19204 52.983753 47.016247 5.967507

Here we can see that females are dominated

OVERALL SCORES

In [26]:
binsize=25
bins=numpy.arange(min(data['score_overall']),max(data['score_overall'])+binsize, binsize)
##plotting
plt.title('Overall Score of Students')
plt.hist(data=data.query('gender=="Female"'),x='score_overall', alpha=.4,bins=bins,label='Female')##querying female data
plt.hist(data=data.query('gender=="Male"'),x='score_overall', alpha=.4,bins=bins, label='Male')##querying male data
plt.legend();

PARENTAL ATTITUDE TOWARDS MATHEMATICS

In [27]:
seaborn.boxplot(data=data, y='parents', x='gender')##plot of parental view
;
Out[27]:
''
In [28]:
#3getting the describtion of parental view
data.query('gender=="Female"').parents.describe(), data.query('gender=="Male"').parents.describe()
Out[28]:
(count    158373.000000
 mean          3.036302
 std           0.610329
 min           1.000000
 25%           2.666667
 50%           3.000000
 75%           3.666667
 max           4.000000
 Name: parents, dtype: float64, count    153309.000000
 mean          3.097665
 std           0.608550
 min           1.000000
 25%           2.666667
 50%           3.000000
 75%           3.666667
 max           4.000000
 Name: parents, dtype: float64)
In [29]:
##plotting the histogram
plt.hist(data.query('gender=="Female"')['parents'],bins=numpy.arange(1,4+1/3,1/3), label='Female', alpha=.4)##getting female data
plt.hist(data.query('gender=="Male"')['parents'],bins=numpy.arange(1,4+1/3,1/3), label='Male', alpha=.4)##getting male data
plt.legend()
plt.title('Parental View Of Mathematics On The Basis Of Gender')##title
plt.xlabel('score')##xlabel
plt.ylabel('no of students');##ylabel
In [59]:
##plotting density curve diagram
seaborn.kdeplot(data=data.query('gender=="Female"')['parents'], shade=True, color='green', bw=1/4, label='Female') ##getting female data
seaborn.kdeplot(data=data.query('gender=="Male"')['parents'], shade=True, color='red', bw=1/4, label='Male') ##getting male data
plt.title('Parental View Of Mathematics On The Basis Of Gender')##title
plt.xlabel('score') ##xlabel
plt.ylabel('number of students'); ##ylabel
In [31]:
#Attitude of students
attitude=['motivation', 'anxiety','interest', 'work_ethic', 'behavior', 'self']

fig, ax= plt.subplots(nrows=2,ncols=3,figsize=[20,12])

#Determining binsizes
#binsizes = 1/(no of questions per sec)
binsizes=[1/4,1/5,1/4,1/9,1/8,1/5]

ax=ax.flatten()
i=0

for feature in attitude:
    bins=numpy.arange(min(data[feature]), max(data[feature]) +binsizes[i], binsizes[i])
    ax[i].hist(data=data.query('gender=="Female"'), x=feature, bins=bins,label='Female', alpha=.4)
    ax[i].hist(data=data.query('gender=="Male"'),x=feature, bins=bins,label='Male', alpha=.4)
    
    ax[i].set_xlabel('Score')##xlabel
    ax[i].set_ylabel('No of Students')##ylabel
    ax[i].set_title(feature)##title
    ax[i].legend()
    i+=1

plt.savefig('attitudes_gender.png')
In [32]:
##ploting the graph

fig, ax= plt.subplots(nrows=2,ncols=3,figsize=[20,12])


ax=ax.flatten()
i=0

for feature in attitude:
    plt.sca(ax[i])
    
    seaborn.scatterplot(x=data[feature],y=data['score_overall'],alpha=.05)
    ax[i].set_ylabel('Score: {}'.format('Overall Score'))##ylabel
    ax[i].set_xlabel('Attitude Score')##xlabel
    ax[i].set_title('Overall Score by {}'.format(feature.title()))##title
    i+=1
In [33]:
##analysing with the help of a heat map
seaborn.heatmap(data=data[['score_overall','motivation', 'anxiety','interest', 'work_ethic', 'behavior', 'self']].corr(),
           center=0, cmap="RdBu_r",annot=True, vmin=-1, vmax=1);
In [34]:
##Analysing with the help of scatter plot
seaborn.scatterplot(x=data['parents'],y=data['score_overall'],alpha=.05);
In [35]:
##To get the overall score
data[['parents','score_overall']].corr()
Out[35]:
parents score_overall
parents 1.000000 -0.011132
score_overall -0.011132 1.000000
In [36]:
##analysing using heatmap for parents view
seaborn.heatmap(data=data[attitude+['parents']].corr(), center=0, cmap="RdBu_r",annot=True, vmin=-1, vmax=1);

MOTIVATION SCORE BY GENDER ON THE BASIS OF COUNTRY

In [37]:
##getting the kdeplot for binsize of 1/4
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'motivation', bw=1/4)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').motivation.quantile(.5), color='green')
    ax.axvline(x=data.query('country==@c').motivation.quantile(.75), color='pink', alpha=.75)
    ax.legend()

ANXIETY SCORE BY GENDER ON THE BASIS OF COUNTRY

In [38]:
##getting kdeplot of binsize 1/5
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'anxiety', bw=1/5)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').anxiety.quantile(.5), color='green')
    ax.axvline(x=data.query('country==@c').anxiety.quantile(.75), color='violet', alpha=.75)
    ax.legend()

INTEREST SCORE BY GENDER ON THE BASIS OF COUNTRY

In [39]:
## getting kdeplot
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'interest', bw=1/4)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').interest.quantile(.5), color='yellow')
    ax.axvline(x=data.query('country==@c').interest.quantile(.75), color='green', alpha=.75)
    ax.legend()

WORK ETHIC SCORE BY GENDER ON THE BASIS OF COUNTRY

In [40]:
##plotting the kdeplot
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'work_ethic', bw=1/4)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').work_ethic.quantile(.5), color='red')
    ax.axvline(x=data.query('country==@c').work_ethic.quantile(.75), color='black', alpha=.75)
    ax.legend()

BEHAVIOUR SCORE BY GENDER ON THE BASIS OF COUNTRY

In [41]:
##plotting the kdeplot
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'behavior', bw=.125)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').behavior.quantile(.5), color='yellow')
    ax.axvline(x=data.query('country==@c').behavior.quantile(.75), color='red', alpha=.75)
    ax.legend()

SELF SCORE BY GENDER ON BASIS OF COUNTRY

In [42]:
##plotting kdeplot
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'self', bw=1/5)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').self.quantile(.5), color='green')
    ax.axvline(x=data.query('country==@c').self.quantile(.75), color='black', alpha=.75)
    ax.legend()

PARENTAL SCORE BY GENDER BASED ON COUNTRY

In [43]:
##getting kdeplot
n=seaborn.FacetGrid(data=data, col='country', hue='gender', col_wrap=4)
n.map(seaborn.kdeplot, 'parents', bw=1/3)

for ax, c in zip(n.axes.flat, country):
    ax.axvline(x=data.query('country==@c').parents.quantile(.5), color='green')
    ax.axvline(x=data.query('country==@c').parents.quantile(.75), color='violet', alpha=.75)
    ax.legend()

LET'S GO AHEAD AND TAKE A LOOK AT THE CHANGES IN OUR DATA AS THE TOP PERCENTILE UNDER OVERALL SCORE INCREASES

In [44]:
##getting the data frame
datae=pandas.DataFrame(columns=['nth_percentile','m_prop','f_prop',
                         'avg_motivation','avg_anxiety','avg_interest','avg_work_ethic','avg_parents','avg_behavior','avg_self',
                        'm_motivation','m_anxiety','m_interest','m_work_ethic','m_parents','m_behavior','m_self',
                        'f_motivation','f_anxiety','f_interest','f_work_ethic','f_parents','f_behavior','f_self'])
In [45]:
##for loop is used to get the required datas
for i in range(100):
    n=i*.01
    
    s=data.score_overall.quantile(n)
    datatop=data.query('score_overall>=@s')
    
    mprop= datatop.gender.value_counts()['Male']/datatop.gender.value_counts().sum()
    fprop= datatop.gender.value_counts()['Female']/datatop.gender.value_counts().sum()    
    
    motivation=datatop.motivation.mean()
    anxiety=datatop.anxiety.mean()
    interest=datatop.interest.mean()
    work_ethic=datatop.work_ethic.mean()
    parents=datatop.parents.mean()
    behavior= datatop.behavior.mean()
    self=datatop.self.mean()
    
    dataf=datatop.query('gender=="Female"')
    fmotivation=dataf.motivation.mean()
    fanxiety=dataf.anxiety.mean()
    finterest=dataf.interest.mean()
    fwork_ethic=dataf.work_ethic.mean()
    fparents=dataf.parents.mean()
    fbehavior= dataf.behavior.mean()
    fself=dataf.self.mean()
    
    datam=datatop.query('gender=="Male"')
    mmotivation=datam.motivation.mean()
    manxiety=datam.anxiety.mean()
    minterest=datam.interest.mean()
    mworkethic=datam.work_ethic.mean()
    mparents=datam.parents.mean()
    mbehavior= datam.behavior.mean()
    mself=datam.self.mean()
    
    datae=datae.append({'nth_percentile':n,'mprop':mprop,'fprop':fprop,
                  'avg_motivation':motivation,'avg_anxiety':anxiety,'avg_interest':interest,
                  'avg_work_ethic':work_ethic,'avg_parents':parents,'avg_behavior':behavior,'avg_self':self,
                  'm_motivation':mmotivation,'m_anxiety':manxiety,'m_interest':minterest,
                  'm_work_ethic':mworkethic,'m_parents': mparents,'m_behavior':mbehavior,'m_self':mself,
                  'f_motivation':fmotivation,'f_anxiety':fanxiety,'f_interest':finterest,
                  'f_work_ethic':fwork_ethic,'f_parents':fparents,'f_behavior':fbehavior,'f_self':fbehavior},
                 ignore_index=True)
In [46]:
##getting the top part
datae.head()
Out[46]:
nth_percentile m_prop f_prop avg_motivation avg_anxiety avg_interest avg_work_ethic avg_parents avg_behavior avg_self ... m_self f_motivation f_anxiety f_interest f_work_ethic f_parents f_behavior f_self fprop mprop
0 0.00 NaN NaN 2.988886 2.495314 2.445061 2.866265 3.066485 1.677428 2.482890 ... 2.598975 2.940772 2.577510 2.380483 2.900185 3.036302 1.599006 1.599006 0.504777 0.495223
1 0.01 NaN NaN 2.988211 2.492472 2.443034 2.865957 3.065976 1.674171 2.483480 ... 2.599340 2.939370 2.574964 2.378237 2.900086 3.035496 1.596455 1.596455 0.504963 0.495037
2 0.02 NaN NaN 2.987725 2.489371 2.441164 2.865658 3.065535 1.670887 2.484580 ... 2.600384 2.938198 2.572108 2.376208 2.900016 3.034598 1.593596 1.593596 0.504928 0.495072
3 0.03 NaN NaN 2.986818 2.485832 2.439023 2.865128 3.064948 1.667679 2.485826 ... 2.601548 2.936801 2.568881 2.374099 2.899889 3.033691 1.591236 1.591236 0.504819 0.495181
4 0.04 NaN NaN 2.986120 2.482316 2.437157 2.864962 3.064312 1.665027 2.487382 ... 2.602896 2.935303 2.565335 2.372029 2.900093 3.032779 1.589037 1.589037 0.504468 0.495532

5 rows × 26 columns

In [50]:
##getting a 2 lines lineplot
plt.figure(figsize=(10,8))
seaborn.lineplot(data=datae,x='nth_percentile',y='fprop', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='mprop',label='Male')
plt.legend()
plt.title('GENDER PROPORTION OF STUDENTS WITH TOP PERCENTILE')
plt.ylabel('proportion')
plt.xlabel('nth percentile of overall math score');
In [51]:
#getting nth percintile of overall math score using lineplot
seaborn.lineplot(data=datae,x='nth_percentile',y='f_motivation', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_motivation', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_motivation', label='Average Total')
plt.legend()
plt.title('AVERAGE MOTIVATION SCORE OF STUDENTS WITH TOP PERCENTILE')##title
plt.ylim(bottom=1)#3limit
plt.ylabel('motivation score')##ylabel
plt.xlabel('nth percentile with overall math score');##xlabel
In [52]:
##To get average anxiety score using lineplot
seaborn.lineplot(data=datae,x='nth_percentile',y='f_anxiety', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_anxiety', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_anxiety', label='Average Total')
plt.legend()
plt.title('AVERAGE ANXIETY SCORE OF STUDENTS WITH TOP PERCENTILE')##TITLE
plt.ylim(bottom=1)##limit
plt.ylabel('anxiety score')##ylabel
plt.xlabel('nth percentile of overall math score');##xlabel
In [53]:
##To get average interest score using line plot
seaborn.lineplot(data=datae,x='nth_percentile',y='f_interest', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_interest', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_interest', label='Average Total')
plt.legend()
plt.title('AVERAGE INTEREST SCORE OF STUDENTS WITH  TOP PERCENTILE')##title
plt.ylim(bottom=1)#limit
plt.ylabel('interest score')##ylabel
plt.xlabel('nth percentile of overall math score');##xlabel
In [55]:
##using two graphs to get a closer view to get average work ethic score
plt.figure(figsize=(20,8))

plt.subplot(1,2,1)
seaborn.lineplot(data=datae,x='nth_percentile',y='f_work_ethic', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_work_ethic', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_work_ethic', label='Average Total')
plt.legend()
plt.title('AVERAGE WORK ETHIC SCORE OF STUDENTS WITH TOP PERCENTILE')#title
plt.ylim(bottom=1)##limit
plt.ylabel('work ethic score')##ylabel
plt.xlabel('nth percentile of overall math score')##xlabel

plt.subplot(1,2,2)
seaborn.lineplot(data=datae,x='nth_percentile',y='f_work_ethic', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_work_ethic', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_work_ethic', label='Average Total')
plt.legend()
plt.title(' AVERAGE WORK ETHIC SCORE OF STUDENTS WITH TOP PERCENTILE :CLOSER VIEW')##title
plt.ylabel('work ethic score')##ylabel
plt.xlabel('nth percentile of overall math score');##xlabel
In [56]:
##using two graphs to get a closer view to get average parents score
plt.figure(figsize=(20,8))

plt.subplot(1,2,1)
seaborn.lineplot(data=datae,x='nth_percentile',y='f_parents', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_parents', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_parents', label='Average Total')
plt.legend()
plt.title('AVERAGE PARENTS SCORE FOR STUDENTS WITH TOP PERCENTILE')##title
plt.ylim(bottom=1)##limit
plt.ylabel('parents score')##ylabel
plt.xlabel('nth percentile of overall math score')##xlabel

plt.subplot(1,2,2)
seaborn.lineplot(data=datae,x='nth_percentile',y='f_parents', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_parents', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_parents', label='Average Total')
plt.legend()
plt.title(' AVERAGE PARENTS SCORE FOR STUDENTS WITH TOP PERCENTILE : CLOSER VIEW')##title
plt.ylabel('parent score')##YLABEL
plt.xlabel('nth percentile of overall math score');##XLABEL
In [57]:
##using lineplot to get average behaviour score 
seaborn.lineplot(data=datae,x='nth_percentile',y='f_behavior', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_behavior', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_behavior', label='Average Total')
plt.legend()
plt.title('AVERAGE BEHAVIOUR SCORE OF STUDENTS WITH TOP PERCENTILE')##TITLE
plt.ylim(bottom=1)##limit
plt.ylabel('behavior score')##ylabel
plt.xlabel('nth percentile of overall math score');##xlabel
In [58]:
##using lineplot to get average self score 
seaborn.lineplot(data=datae,x='nth_percentile',y='f_self', label='Female')
seaborn.lineplot(data=datae,x='nth_percentile',y='m_self', label='Male')
seaborn.lineplot(data=datae,x='nth_percentile',y='avg_self', label='Average Total')
plt.legend()
plt.title('AVERAGE SELF SCORE OF STUDENTS WITH TOP PERCENTILE')##title
plt.ylim(bottom=1)##limit
plt.ylabel('self score')##ylabel
plt.xlabel('nth percentile of overall math score');##xlabel

After these analysis we can see that the average self score for males is consistently higher that the typical self score for females. We can notice how there's an outsized gap between how females feel about themselves regarding their math skills as compared to how males feel about themselves in their math skills.

In [ ]: